




Europäisches Patentamt  
European Patent Office  
Office européen des brevets

(11) Publication number:

0 189 943  
A2

(12)

## EUROPEAN PATENT APPLICATION

(21) Application number: 86101338.1

(51) Int. Cl.: G 06 F 15/68

(22) Date of filing: 31.01.86

(30) Priority: 01.02.85 JP 16553/85  
27.09.85 JP 214163/85  
20.12.85 JP 285576/85

(43) Date of publication of application:  
06.08.86 Bulletin 86/32

(84) Designated Contracting States:  
CH DE FR GB IT LI NL SE

(71) Applicant: HITACHI, LTD.  
6, Kanda Surugadai 4-chome Chiyoda-ku  
Tokyo 100(JP)

(72) Inventor: Miura, Shuichi  
Yuhoryo 305, 20-3 Ayukawacho-6-chome  
Hitachi-shi(JP)

(72) Inventor: Kobayashi, Yoshiki  
24-5, Mikanoharacho-2-chome  
Hitachi-shi(JP)

(72) Inventor: Fukushima, Tadashi  
23-5, Haneyamacho-1-chome  
Hitachi-shi(JP)

(72) Inventor: Okuyama, Yoshiyuki  
Tozawaryo 901, 10-12 Suehirocho-3-chome  
Hitachi-shi(JP)

(72) Inventor: Kato, Takeshi  
5-19, Higashionumacho-2-chome  
Hitachi-shi(JP)

(72) Inventor: Hirasawa, Kotaro  
10-7, Kanesawacho-7-chome  
Hitachi-shi(JP)

(72) Inventor: Asada, Kazuyoshi  
15-8-1, Suwacho-4-chome  
Hitachi-shi(JP)

(74) Representative: Strehl, Schübel-Hopf, Groening, Schulz  
Widenmayerstrasse 17 Postfach 22 03 45  
D-8000 München 22(DE)

(54) Parallel image processor.

(57) An LSI parallel image processor in which line buffers (20-i) and data-flow switching circuits (70) each requiring a larger amount of hardware in the prior art are incorporated in an LSI circuit, the image data delayed by the line buffers (20-i) is output from an image data output port (55), shift registers (31-i) each having a variable number of steps for preserving local image regions are intermittently shifted-in in accordance with applied clocks, and the contents of the shift registers (31-i) are sequentially read out.

EP 0 189 943 A2

FIG. 1



#### BACKGROUND OF THE INVENTION

This invention relates to a processor for parallel image processing which performs local neighboring (kernel) image processing such as spatial convolution operations.

Image processing is classified into preprocessing, feature extraction processing, judgement processing, etc., and the parallel image processor according to this invention is mainly suitable for the preprocessing.

This preprocessing should be performed by an image processor which is versatile and allows high speed processing. However, since the image data to be processed extend in two dimensions, it is difficult to process all the image data in parallel. Therefore, parallel processing is often applied to operations among local neighboring image data, such as spatial convolution operations intended for noise reduction and edge enhancement. In order to process such local neighboring image data, there has been proposed an LSI circuit of a local parallel type image processor, which is disclosed in Japanese Patent Unexamined Publication No. 59-146,366 (corresponding to U.S. Application Serial No. 578,508) and U.S.P. No. 4,550,437. This circuit was large-scale integrated using, as a main module, a parallel operation circuit which operates on parts of the local neighboring data in parallel; plural main modules are arranged, or one main module is subjected to time division processing, to extend the size of the local image region, thereby performing the parallel processing of local neighboring operations with high speed and versatility.

Namely, this processor performs an $m \times n$ ($m, n$: integers) local parallel image processing in such a way that (1) $m$ main modules each having arithmetic units (processor elements, PE's) are arranged and perform the processing in one machine cycle, or (2) a single main module having $n$ PE's is used in a time division manner and performs the processing in $m$ machine cycles.

In the above prior art, where plural main modules are used to perform image processing, line buffer circuits are employed as externally equipped circuits for supplying the image data in parallel to the respective main modules. Therefore, once the wiring is made, the local image region which permits parallel processing is disadvantageously fixed. Moreover, additional line buffer circuits must be employed to expand the local neighboring region. For example, where a $3 \times 3$ local parallel operation is performed at an operating frequency of 6 MHz on an image of $256 \times 256$ pixels with each pixel represented by 8 bits, a 4 Kbit high speed memory or shift register operating at 6 MHz is required, so that the required amount of hardware becomes large.
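The 4 Kbit figure can be checked with a short calculation. The sketch below assumes that an $m$-row local region needs $m - 1$ line buffers, each holding one full image line (the arrangement later shown in Fig. 2 uses two line buffers for a $3 \times 3$ region); the function name is hypothetical.

```python
def line_buffer_bits(image_width, bits_per_pixel, region_rows):
    """Total line-buffer storage, assuming (region_rows - 1) buffers of one image line each."""
    return (region_rows - 1) * image_width * bits_per_pixel


# 3 x 3 region on a 256-pixel-wide image with 8-bit pixels
bits = line_buffer_bits(image_width=256, bits_per_pixel=8, region_rows=3)
print(bits, "bits =", bits // 1024, "Kbit")
```

Under the same assumption, each additional row of the local region adds another 2 Kbit of external buffering, which illustrates why expanding the region is costly in the prior art.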


On the other hand, where time division processing is carried out for the image processing, the above line buffer circuit is not required. However, the image data must be supplied to the main module by means of a particular scanning method called stick scanning. In order to convert ordinary raster-scanned image data into stick-scanned data, a larger amount of hardware is required than for the above line buffer circuit.

#### SUMMARY OF THE INVENTION

An object of this invention is to provide a parallel image processor which obviates the above disadvantages of the prior art and allows the local image region subjected to a local neighboring operation to be expanded easily with a smaller amount of hardware.

Another object of this invention is to provide a parallel image processor which can be flexibly applied to several local image regions by means of the same hardware construction.

These objects can be attained by an LSI parallel image processor in which line buffers and data-flow switching circuits, which require a larger amount of hardware in the prior art, are incorporated into an LSI circuit; the image data delayed by the line buffers is output from an image data output port; shift registers each having a variable number of steps for preserving local image regions are intermittently shifted in accordance with applied clocks; and the contents of the shift registers are sequentially read out.

Namely, in accordance with this invention, the amount of hardware can be reduced since the line buffer circuits are incorporated in the LSI and the delayed image data is output, and the size of the local image region can be easily expanded merely by connecting LSI's. Further, the data-flow switching circuits are also incorporated as peripheral circuits to operate the step-number-variable shift registers in a time division manner, so that the parallel image processor according to this invention can be freely adapted to various local image regions without altering the external wirings.

The above and other objects and features of this invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which like reference characters refer to like elements in the several views.

#### BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram showing one arrangement of the main module used in the parallel image processor according to one embodiment of this invention;

Fig. 2 is a view for explaining a local parallel operation system;

Fig. 3 is a block diagram of a parallel operation section inside the main module;

Fig. 4 is a block diagram showing a unifying circuit inside the main module;

Fig. 5 is a block diagram for explaining examples of the operation of the unifying circuit;

Fig. 6 is a block diagram showing the arrangement of a line buffer inside the main module;

Fig. 7 is a block diagram showing one arrangement of a step-number-variable shift register inside the main module;

Fig. 8 is a circuit diagram of each of the cells in the step-number-variable shift register;

Fig. 9 is a view for explaining the operation of the step-number-variable shift register of Fig. 7;

Fig. 10 is a timing chart of the step-number-variable shift register of Fig. 7;

Fig. 11 is a block diagram showing another arrangement of a step-number-variable shift register inside the main module;

Fig. 12 is a timing chart of the step-number-variable shift register of Fig. 11;

Fig. 13 is a block diagram showing still another arrangement of a step-number-variable shift register inside the main module;

Fig. 14 is a view for explaining the operation of the step-number-variable shift register of Fig. 13;

Fig. 15 is a timing chart of the step-number-variable shift register of Fig. 13;

Figs. 16 to 18 are block diagrams showing examples of the application of the main module, respectively;

Fig. 19 is a block diagram showing an arrangement of the main module used in the parallel image processor according to another embodiment of this invention;

Figs. 20 to 22 are block diagrams showing examples of the application of the main module, respectively;

Fig. 23 is a block diagram showing an arrangement of the main module used in the parallel image processor according to still another embodiment of this invention; and

Figs. 24 to 27 are block diagrams showing examples of the application of the main module, respectively.

#### DESCRIPTION OF THE PREFERRED EMBODIMENTS

Several embodiments of this invention will be  
explained hereinafter referring to the drawings.

Fig. 2 shows a local parallel operation system for performing, at a high speed, a $3 \times 3$ ($m \times n$; $m, n$: integers) local neighboring image processing, which is a main operation of the image preprocessing.

It is assumed that an input image 1 to be processed is a gray-scale image consisting of $10 \times 10$ image data, and that the image is raster-scanned in the order of ①, ②, ... as shown in Fig. 2. Fig. 2 shows the state when the raster scanning has proceeded up to the image data ㉝.

The image data raster-scanned from input image 1 are fed to a register 31-00 and a line buffer 20-0. The image data fed to register 31-00 are shifted to registers 31-01 and 31-02 in order. The image data fed to line buffer 20-0 are delayed by the time required to scan one line of the image data and then fetched therefrom.

The image data fetched from line buffer 20-0 are fed to a register 31-10 and a line buffer 20-1. The image data fed to register 31-10 are shifted to registers 31-11 and 31-12 in order. The image data fed to line buffer 20-1 are delayed by the time required to scan one line of the image data and then fetched therefrom.

The image data fetched from line buffer 20-1 are fed to a register 31-20. The image data fed to register 31-20 are shifted to registers 31-21 and 31-22 in order.

Thus, when the image data ㉝ is fed to register 31-00 and line buffer 20-0, the $3 \times 3$ local neighboring image data ⑪, ⑫, ⑬, ㉑, ㉒, ㉓, ㉛, ㉜ and ㉝, with the image data ㉒ centered, are simultaneously stored in the nine registers 31, respectively. Therefore, by employing the same number of arithmetic units as registers 31, the image data in the respective registers 31 can be operated on in parallel, so that high speed processing can be realized.
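The data flow of Fig. 2 can be illustrated with a small software model. This is only a sketch: it assumes the pixels of the $10 \times 10$ image are numbered 1 to 100 in raster order, models each line buffer as a FIFO one image line long, and uses hypothetical names (`local_windows`, `registers`) that do not appear in the original.

```python
from collections import deque


def local_windows(pixels, width, rows=3, cols=3):
    """Yield (pixel, registers) after each raster-scanned pixel is fed in.

    registers[i][j] models register 31-ij: row i is delayed by i image lines,
    column j by j pixels.
    """
    # (rows - 1) line buffers, each delaying its input by one image line
    line_buffers = [deque([None] * width, maxlen=width) for _ in range(rows - 1)]
    registers = [[None] * cols for _ in range(rows)]
    for pixel in pixels:
        data = pixel
        for i in range(rows):
            # shift the new datum into row i of the register array
            registers[i] = [data] + registers[i][:-1]
            if i < rows - 1:
                delayed = line_buffers[i][0]   # datum entered `width` steps ago
                line_buffers[i].append(data)   # oldest entry drops out
                data = delayed
        yield pixel, [row[:] for row in registers]


for pixel, regs in local_windows(range(1, 101), width=10):
    if pixel == 33:
        for row in regs:
            print(row)
```

With this numbering, the nine registers hold pixels 11 to 13, 21 to 23 and 31 to 33 at the moment pixel 33 is fed in, i.e. the $3 \times 3$ region centered on pixel 22.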

Fig. 1 shows an arrangement of the main module 10 of the parallel image processor according to one embodiment of this invention, which is capable of implementing the above local parallel operation system. Main module 10 comprises an image data input port 54 from which the image data are input, an image data output port 55 from which the image data delayed inside main module 10 are output, an operation data input port 64 from which the operation result from another main module 10 is input, and an operation result output port 65 from which the internal processing result is output.

The image data raster-scanned from input image 1 are fed to a step-number-variable shift register (VSR) 31-0, line buffer 20-0, and a selector 70 through image data input port 54. Line buffer 20-0 delays the input image data by the time required to scan one line of the image data, and delivers the delayed image data to a selector 33-0, line buffer 20-1, and selector 70. Line buffer 20-1 delays the image data fed from line buffer 20-0 by the time required to scan one further line of the image data and delivers the delayed image data to selectors 33-1 and 70.

Selector 70 selects one of the image data from image data input port 54, the output from line buffer 20-0 and the output from line buffer 20-1 in accordance with a control signal from a control circuit 21, and outputs it from image data output port 55. Namely, one of the image data delayed from the input image data by 0, 1 and 2 lines is selected by selector 70 and output from image data output port 55. (Incidentally, the output from image data output port 55 is the input image data of the next main module 10 when plural main modules are employed.)

VSR 31-0 carries out a shifting operation in accordance with a control signal from control circuit 21 and delivers the image data to a parallel operating section 30 and selector 33-0.

Selector 33-0 selects either the output from line buffer 20-0 or the output from VSR 31-0 in accordance with a control signal from control circuit 21 and delivers it to VSR 31-1. VSR 31-1 carries out a shifting operation in the same manner as VSR 31-0 and supplies the image data to parallel operating section 30 and selector 33-1.

Selector 33-1 selects either the output from line buffer 20-1 or the output from VSR 31-1, under the same manner of control as selector 33-0, and supplies it to VSR 31-2. VSR 31-2 carries out a shifting operation in the same manner as VSR 31-0 and supplies the image data to parallel operating section 30. Thus, VSR's 31 can be arranged in either of two manners, $1 \times 3$ or $3 \times 1$, by the switching operation of selectors 33. The arrangement of VSR's 31 corresponds to that of the local image data which can be operated on simultaneously during one machine cycle.

Parallel operating section 30 operates in parallel on the image data from VSR's 31-0, 31-1 and 31-2 and delivers the result of operation to a unifying circuit 40. Unifying circuit 40 unifies the operation data supplied from operation data input port 64 and the output from parallel operating section 30. The unified data is fetched from operation data output port 65 and stored in an output image 2.

The main module 10 in accordance with this embodiment permits three image data simultaneously supplied from three VSR's 31 to be processed in parallel in parallel operating section 30.

On the other hand, the most general local neighboring image operation is an operation of processing $3 \times 3$ local neighboring image data as shown in Fig. 2, in which nine image data are required to calculate one output image data. Such a $3 \times 3$ local neighboring image operation using the main module 10 can be realized by the following two systems:

- (1) time division processing
- (2) provision of more main modules

The system of (1) operates on the nine local neighboring image data in such a way that three image data are assigned to each of three machine cycles, and unifies the operation results in unifying circuit 40 over the three machine cycles. In this system, the input of the image data and the output of the operation results are each performed once every three machine cycles. The main module 10 according to this embodiment permits time division processing of up to eight machine cycles, so that up to 24 image data can be processed in a time division manner using one main module 10.

In the case of an $n$-times time division processing, line buffer 20 operates once during $n$ machine cycles, and VSR 31 performs a shift operation once during $n$ machine cycles and preserves $1 \times n$ local neighboring image data during the $n$ machine cycles. VSR 31 further sends the $n$ image data to parallel operating section 30 one by one during the $n$ machine cycles. Parallel operating section 30 performs the arithmetic between the image data supplied in $n$ times and the $n$ coefficient data which are produced corresponding to the image data every machine cycle, and supplies the operation results (data) to unifying circuit 40 every machine cycle. Unifying circuit 40 unifies the operation data supplied in $n$ times from parallel operating section 30 over the $n$ machine cycles and outputs the unified data from operation data output port 65. Thus, this system is slow in its processing speed but requires only one main module and a smaller amount of hardware.

The system of (2) operates on the $3 \times 3$ local neighboring image data simultaneously during one machine cycle using three main modules 10. In this system, three image data are operated on in each main module and the operation data are unified through these three main modules. This system requires a larger amount of hardware than the system of (1) but can perform the operations at a high speed.
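The two systems trade machine cycles against hardware but yield the same result. The sketch below compares them for one $3 \times 3$ region, assuming that each processor element multiplies a pixel by a coefficient and that unification is addition (the text only speaks of "arithmetic" and "unifying"). The assignment of one window row per main module in system (2), and all names, are likewise assumptions made for illustration.

```python
def time_division(window, coeffs):
    """System (1): one main module, three PE's, one window column per machine cycle."""
    unified = 0
    for cycle in range(len(window[0])):
        # three PE's work in parallel on the cycle-th datum of each VSR
        partial = sum(row[cycle] * crow[cycle] for row, crow in zip(window, coeffs))
        unified += partial  # unifying circuit accumulates over the machine cycles
    return unified


def chained_modules(window, coeffs):
    """System (2): three main modules in one machine cycle, results passed along the chain."""
    carried = 0  # value arriving on the operation data input port of the first module
    for row, crow in zip(window, coeffs):
        carried += sum(p * c for p, c in zip(row, crow))
    return carried


window = [[11, 12, 13], [21, 22, 23], [31, 32, 33]]
coeffs = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
print(time_division(window, coeffs), chained_modules(window, coeffs))
```

In this model system (1) needs three iterations (machine cycles) of one module, while system (2) needs one pass through three modules.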

The main module 10 according to this embodiment is also adapted to multi-mask processing. Multi-mask processing with the number of masks set at $m$ is a processing of performing $m$ local neighboring image operations on one input image 1 and unifying the $m$ output images 2 thus obtained to provide a final result. This multi-mask processing is used for edge enhancement processing, etc. The main module 10 in accordance with this invention permits the processings prior to the unification in the multi-mask processing to be performed in one image scan. In the case of multi-mask processing with the number of masks set at $m$, the image data is taken in once in $m$ machine cycles, and line buffer 20 and VSR 31 also operate once in $m$ machine cycles. VSR 31 continues to supply the same image data to parallel operating section 30 for $m$ machine cycles. Parallel operating section 30 produces $m$ coefficient patterns for one image data during the $m$ machine cycles and performs the arithmetic thereof with the image data every machine cycle. The $m$ operation results are sequentially output from operation data output port 65 during the $m$ machine cycles. Further, this multi-mask processing can be combined with the time division processing mentioned above. In the case of time-division multi-mask processing with the numbers of time divisions and masks set at $t$ and $m$, respectively, the image data is taken in once in $t \times m$ machine cycles and the $m$ operation results are sequentially output, one every $t$ machine cycles.

The above time-division multi-mask processing can be realized by externally operating control circuit 21 so that a control signal MSKTMS from control circuit 21 gives (mask number × time division number − 1) and another control signal TMS from control circuit 21 gives (time division number − 1).
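As a worked example of these two settings, the following sketch computes MSKTMS and TMS from the number of masks and the number of time divisions. The range checks are assumptions drawn from other statements in the text (time division of at most eight machine cycles, and MSKTMS being loaded into a 4-bit counter); the function name is hypothetical.

```python
def control_values(masks, time_divisions):
    """Return (MSKTMS, TMS) for the given number of masks and time divisions."""
    if masks < 1 or not 1 <= time_divisions <= 8:
        raise ValueError("expected masks >= 1 and 1 <= time_divisions <= 8")
    msktms = masks * time_divisions - 1
    if msktms > 0b1111:  # assumed limit: MSKTMS is described as 4 bit data
        raise ValueError("MSKTMS does not fit in 4 bits")
    tms = time_divisions - 1
    return msktms, tms


print(control_values(masks=1, time_divisions=3))  # single-mask 3 x 3, time division
print(control_values(masks=2, time_divisions=3))  # two masks, three time divisions
```

With one mask and no time division both values are zero, which corresponds to the plain one-cycle operation of the main module.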

Fig. 3 illustrates a detailed arrangement of parallel operating section 30. In this figure, output signal lines 300, 301 and 302 from VSR's 31-0, 31-1 and 31-2 are connected to one input of each of three processor elements (PE's) 37-0, 37-1 and 37-2, respectively. The other inputs thereof are connected with three coefficient memories 36-0, 36-1 and 36-2, which supply the previously stored coefficient data to the corresponding processor elements 37 in accordance with address outputs from a counter 35. The outputs from processor elements 37 are unified by an arithmetic element 38 and the unified data are fed to unifying circuit 40 through a signal line 400.

In the case of MSKTMS 1014 $\neq$ 0, the time division processing or multi-mask processing is realized, and coefficient memories 36 read out the coefficient data at addresses which are supplied from counter 35 and changed every machine cycle, and supply them to processor elements 37.
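A behavioural sketch of this section for the multi-mask case is given below. It assumes that each processor element multiplies its image datum by the addressed coefficient, that arithmetic element 38 adds the three products, and that counter 35 cycles through addresses 0 to MSKTMS; none of these details is fixed by the text, and the class and attribute names are hypothetical.

```python
class ParallelOperatingSection:
    """Three PE's fed by counter-addressed coefficient memories (simplified model)."""

    def __init__(self, coefficient_memories, msktms):
        self.memories = coefficient_memories  # one coefficient list per PE (36-0..36-2)
        self.msktms = msktms
        self.counter = 0                      # models counter 35

    def step(self, inputs):
        """Process one machine cycle; `inputs` are the data on lines 300, 301, 302."""
        products = [x * mem[self.counter] for x, mem in zip(inputs, self.memories)]
        # address changes every machine cycle and wraps after MSKTMS (assumption)
        self.counter = 0 if self.counter == self.msktms else self.counter + 1
        return sum(products)                  # models arithmetic element 38


# Two masks: the same three image data are held for two machine cycles.
section = ParallelOperatingSection([[1, -1], [2, 0], [1, 1]], msktms=1)
held = (3, 4, 5)
print([section.step(held) for _ in range(2)])
```

Each machine cycle applies a different coefficient pattern to the same image data, so two mask results are produced per input datum.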

Fig. 4 illustrates a detailed arrangement of unifying circuit 40. The output from parallel operating section 30 is fed to a register 41 and a selector 42 through signal line 400. The output from register 41 is fed to a selector 43. Selector 42 selects either the operation data supplied from operation data input port 64 through a signal line 640 or the output from parallel operating section 30 and supplies it to an arithmetic unit 44. Selector 43 selects either an output line 410 from register 41 or an output line 650 from unifying circuit 40 and supplies it to arithmetic unit 44. The output from arithmetic unit 44 is fetched to the outside from operation data output port 65 through signal line 650.

Selectors 42 and 43 are controlled by control signals 420 and 430 from a counter 46, respectively. Counter 46 is controlled by a reset signal 450 and a control signal TMS 1013 giving (time division number − 1), both supplied from control circuit 21, in such a manner that it is reset when the reset signal is "High" and otherwise repeats the count-up from 0 to TMS. With TMS = 0, selectors 42 and 43 always select signal lines 640 and 410, respectively. With TMS $\neq$ 0, selector 42 selects signal line 640 only when the value of counter 46 becomes equal to TMS, and selector 43 selects signal line 410 only when the value of counter 46 becomes zero.

Fig. 5 shows the operation of unifying circuit 40 when TMS = 2. Unifying circuit 40 unifies, during $(TMS + 1)$ machine cycles, the $(TMS + 1)$ operation data supplied during those cycles and one operation data supplied from data line 640.

In the case shown in Fig. 5, operation data $\bar{a}$, $\bar{b}$ and $\bar{c}$ on data line 400 and an operation data $\bar{x}$ on data line 640 are unified by addition. During a first machine cycle, the operation data $\bar{a}$ and $\bar{b}$ are added. During a second machine cycle, $\bar{a} + \bar{b}$ and $\bar{c}$ are added to provide $\bar{a} + \bar{b} + \bar{c}$. During a third machine cycle, $\bar{a} + \bar{b} + \bar{c}$ and $\bar{x}$ are added. During the subsequent machine cycle, the unified result $\bar{a} + \bar{b} + \bar{c} + \bar{x}$ is fetched from register 45.
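The selector rules and the Fig. 5 sequence can be traced with a cycle-by-cycle model. This is a sketch under stated assumptions: arithmetic unit 44 is taken to be an adder, the output on line 650 is taken to be registered, and the first operation datum of each group is assumed to arrive while counter 46 equals TMS so that register 41 holds it when the counter returns to zero. The function name is hypothetical.

```python
def simulate_unifying_circuit(line_400, line_640, tms):
    """Return the value on line 650 after each machine cycle.

    line_400[k] / line_640[k] are the data present on those lines in cycle k.
    """
    reg41 = 0        # register 41: previous-cycle value of line 400
    out650 = 0       # registered output of arithmetic unit 44
    counter = tms    # counter 46 (initial alignment is an assumption)
    trace = []
    for d400, d640 in zip(line_400, line_640):
        sel42 = d640 if counter == tms else d400   # selector 42
        sel43 = reg41 if counter == 0 else out650  # selector 43
        result = sel42 + sel43                     # arithmetic unit 44 (adder assumed)
        reg41, out650 = d400, result               # clock edge
        counter = 0 if counter == tms else counter + 1
        trace.append(out650)
    return trace


# TMS = 2: a, b, c = 1, 2, 3 on line 400 and x = 100 on line 640
print(simulate_unifying_circuit([1, 2, 3, 10, 20, 30, 0],
                                [0, 0, 0, 100, 0, 0, 200], tms=2))
```

In this model a unified result appears once every TMS + 1 cycles, and with TMS = 0 the circuit simply adds each datum from line 400 to the datum on line 640 in the following cycle.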

Fig. 6 illustrates a detailed arrangement of the two line buffers 20-0 and 20-1 of Fig. 1, which are constructed from RAM's.

The arrangement shown in Fig. 6 permits the number of delaying steps to be altered, i.e., it forms either two line buffers each of which can delay 8 bit data by up to 1024 steps or one line buffer which can delay 8 bit data by up to 2048 steps.

In Fig. 6, RAM's 241 and 242 each have a storage capacity of $8 \times 1024$ bits. When a clock signal 2102 is at its high level (hereinafter simply referred to as "High"), the 8 bit data of RAM's 241 and 242 at the address given by the 10 bit row address signal 2103, which is the output of a row address control circuit 245, are read out on signal lines 252 and 253, respectively. When clock signal 2102 is at its low level (hereinafter simply referred to as "Low") and an output data 2104 from an input/output information control circuit 246 is "Low", the 8 bit data on input signal line 540 is stored at the address of RAM 241 corresponding to row address signal 2103. On the other hand, when clock signal 2102 is "Low" and output data 2104 from input/output information control circuit 246 is "High", the 8 bit data on input signal line 540 is stored at the address of RAM 242 corresponding to row address signal 2103. The respective 8 bit data on signal lines 252 and 253, read out from RAM's 241 and 242, are fed to selectors 243 and 244.

Selector 243 selects the data on signal line 252 when signal line 2104 is "Low" and selects the data on signal line 253 when signal line 2104 is "High", and delivers the selected data to an output signal line 200. On the other hand, selector 244 selects the data on signal line 253 when signal line 2104 is "Low" and selects the data on signal line 252 when signal line 2104 is "High", and delivers the selected data to an output signal line 201.

Row address control circuit 245 is a 10 bit binary counter which is counted up each time clock signal 2102 becomes "High" while control signal 2101 is "Low", and is initialized to zero when control signal 2101 becomes "High". Row address control circuit 245 delivers the counted data, as 10 bit row address signal 2103, to a logic circuit 247 as well as to RAM's 241 and 242. Logic circuit 247 delivers a "High" level output to a signal line 2106 when all bits of the 10 bit row address signal are "High" or when signal line 2101 is "High". In all other cases, logic circuit 247 delivers a "Low" level output.

Input/output information control circuit 246 is a one bit counter (i.e. a T flip-flop) which changes the status of signal line 2104 from "High" to "Low" or from "Low" to "High" each time signal line 2106 becomes "High" while an initialization signal 2105 is "Low". When initialization signal 2105 is "High", signal line 2104 is initialized to "Low".

The circuit of Fig. 6 operates as follows.

It is assumed that, as an initial state, control signal 2101, clock signal 2102 and initialization signal 2105 are all "Low". Now, after initialization signal 2105 is changed to "High" and then to "Low", control signal 2101 is made "High". Then, the output signal 2103 from row address control circuit 245 is zero and the output signal 2104 from input/output information control circuit 246 is "Low". Thereafter, control signal 2101 is changed to "Low", and clock signal 2102 is changed from "Low" to "High" and further to "Low". At this time, while clock signal 2102 is "High", the 8 bit content at the 0-th address of RAM 241 is read out onto output signal line 200 through signal line 252 and selector 243, and the 8 bit content at the 0-th address of RAM 242 is fed onto output signal line 201 through signal line 253 and selector 244. When clock signal 2102 becomes "Low", the 8 bit data on input signal line 540 is written at the 0-th address of RAM 241. During this time, the contents of RAM 242 do not vary at any row address.

Thereafter, each time clock signal 2102 is changed from "Low" to "High" and further to "Low", the row address for read-out and write-in is increased by one, and, in the same manner as mentioned above, the data read out from RAM 241 is fed to output signal line 200, the data read out from RAM 242 is fed to output signal line 201, and the 8 bit data on input signal line 540 is stored at the address of RAM 241 corresponding to the present row address signal.

It is now assumed that control signal line 2101 has become "High" before row address signal 2103 reaches 1023. Then, signal line 2106 is changed from "Low" to "High". This level change of signal line 2106 changes the state of input/output information control circuit 246, making signal line 2104 "High". Thus, the selection states of selectors 243 and 244 are switched so that signal line 252 is connected with output signal line 201 and signal line 253 is connected with output signal line 200. The writable RAM is switched from RAM 241 to RAM 242, so that RAM 241 is no longer writable. The output signal (row address signal) 2103 from row address control circuit 245 is initialized to zero.

Thereafter, if clock signal 2102 is pulsed after control signal 2101 is made "Low", the row address signal 2103 is increased from zero one by one. When clock signal 2102 is "High", in accordance with the present row address signal, the data read out from RAM 241 is fed to output signal line 201 through signal line 252 and selector 244, while the data read out from RAM 242 is fed to output signal line 200 through signal line 253 and selector 243. When clock signal 2102 is "Low", the data on input signal line 540 is stored at the address of RAM 242 corresponding to the present row address signal 2103.

The relation between the arrangement of Fig. 6 and the main module of Fig. 1 will be explained below.

It is now assumed that the contents of RAM's 241 and 242 in Fig. 6 are undefined in their initial state, and that the number of pixels of input image 1 in its horizontal direction is 100.

In Fig. 6, the image data of input image 1 are input from input signal line 540 and first written into RAM 241. Namely, the 100 image data belonging to the first raster are sequentially written at the row addresses 0 to 99 of RAM 241. At this time, undefined data are read out from RAM's 241 and 242. Next, the 100 image data (pixel data) belonging to the second raster are written at the row addresses 0 to 99 of RAM 242. At this time, the first raster image data are read out from RAM 241 while undefined data are read out from RAM 242.

The 100 image data belonging to the third raster are written at the row addresses 0 to 99 of RAM 241. At this time, the first raster image data are read out from RAM 241 to output signal line 200 through signal line 252 and selector 243, while the second raster image data are read out from RAM 242 to output signal line 201 through signal line 253 and selector 244. Moreover, the 100 image data belonging to the fourth raster are written at the row addresses 0 to 99 of RAM 242. At this time, the second raster image data are read out from RAM 242 to output signal line 200 through signal line 253 and selector 243, while the third raster image data are read out from RAM 241 to output signal line 201 through signal line 252 and selector 244.

Namely, when the third raster image data are input, RAM's 241 and 242 output the data as line buffers 20-1 and 20-0, respectively. On the other hand, when the fourth raster image data are input, RAM's 241 and 242 output the data as line buffers 20-0 and 20-1, respectively.

Generally, the odd-numbered raster image data are written in RAM 241 whereas the even-numbered raster image data are written in RAM 242. The raster image data read out from RAM's 241 and 242 are fed to the output signal lines 200 and 201 in such a way that the lower-numbered raster image data are fed to output signal line 200 whereas the higher-numbered raster image data are fed to output signal line 201.

When the number of delayed steps exceeds 1024, i.e., when the row address number reaches 1023, signal line 2106 becomes "High" so that the output signal 2104 from input/output information control circuit 246 changes its state. Thus, writing into the RAM written so far ceases and writing into the other RAM is instructed (this writing starts from the 0-th address thereof).

Also, when the signal 2104 changes its state, the connection states between RAM's 241 and 242 and output signal lines 200 and 201 are switched. Accordingly, the arrangement shown in Fig. 6 can be used as an 8-bit 2048 step line buffer having input signal 540 and output signal 200.
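The alternating use of the two RAM's can be reproduced with a simplified behavioural model. The sketch below ignores the clock phases and signal-level timing: each call to `clock` reads both RAM's at the current row address, routes them as selectors 243 and 244 would, and then writes the input into the RAM chosen by signal 2104. The class and method names are hypothetical, and toggling immediately after the last address is written is a simplification.

```python
class DualRamLineBuffer:
    """Simplified model of the Fig. 6 line buffer built from two RAM's."""

    def __init__(self, depth=1024):
        self.depth = depth
        self.ram = [[None] * depth, [None] * depth]  # models RAM 241 and RAM 242
        self.addr = 0    # row address signal 2103
        self.sel = 0     # signal 2104: 0 = "Low" (write RAM 241), 1 = "High"

    def end_of_line(self):
        """Control signal 2101 pulsed "High": reset the address and swap the RAM roles."""
        self.addr = 0
        self.sel ^= 1

    def clock(self, data):
        """One pixel clock: read both RAM's, then write `data`. Returns (line 200, line 201)."""
        out241, out242 = self.ram[0][self.addr], self.ram[1][self.addr]
        out200, out201 = (out241, out242) if self.sel == 0 else (out242, out241)
        self.ram[self.sel][self.addr] = data
        self.addr += 1
        if self.addr == self.depth:  # address wrapped: behaves as one 2 x depth buffer
            self.addr = 0
            self.sel ^= 1
        return out200, out201


buf = DualRamLineBuffer()
for raster in range(1, 5):
    for x in range(100):
        outputs = buf.clock((raster, x))
    buf.end_of_line()
print(outputs)  # outputs seen while the last pixel of the fourth raster is written
```

While the fourth raster is being written, line 200 carries the second raster and line 201 the third, matching the sequence described above. If `end_of_line` is never called, the address wraps at `depth` and line 200 delivers the input delayed by twice the depth, which corresponds to the single 2048 step mode.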

The line buffers described above are constructed from RAM's, which are suitable for LSI, but needless to say they can also be constructed from shift registers.

Fig. 7 illustrates one detailed arrangement of VSR 31-0.

VSR 31-0 consists of a read-out signal control section 18 for performing a shifting operation, an output selection control section 19 and variable-step-number shift register cells (vsr) 100. The image data raster-scanned from input image 1 are input to vsr's 100 from input data line 540 as 8-bit data. The output from vsr's 100 is fed to parallel operating section 30 and selector 33-0. Each vsr 100 performs the input and shift of the data by the read-out and write-in of the data during one machine cycle. In VSR 31-0 shown in Fig. 7, each vsr 100 performs the write-in and read-out of the data in accordance with a write enable signal $\phi_1$ 1001 in synchronism with a clock and a read enable signal $\phi_2'$ 1006 supplied from read-out signal control section 18. The output selection signal 1015 supplied from output selection control section 19 is fed to a clock gate 1500 (Fig. 8) constituting a selector, which is embedded in vsr 100. The data in the vsr 100 for which output selection signal 1015 becomes "High" is fed to output data line 300 as the output from the selector.

Read-out signal control section 18 in Fig. 7 takes in (inputs) a read enable signal 1002 in synchronism with a clock and outputs a read enable signal 1006 which intermittently becomes "High".

The read-out signal control section for performing a shifting operation consists of a 4-bit down counter 104, a half register (HR) 102 and a delay circuit 101. The 4-bit down counter 104 counts down every clock. When a reset signal 1000 becomes "High" or a counter output 1004 becomes zero, a load signal 1024 becomes "High", and during the subsequent machine cycle, 4-bit data MSKTMS 1014 is loaded into the 4-bit down counter 104 from control circuit 21. HR 102 and delay circuit 101 generate a read control signal 1005 by delaying load signal 1024 by a half machine cycle, so that read enable signal 1006 can be "High" during the machine cycle subsequent to the machine cycle during which load signal 1024 has become "High".
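The periodic behaviour of the load signal can be sketched in software. The function below is a minimal model under assumptions: the name `load_signal` is hypothetical, the counter is taken to hold zero right after reset, and the half-cycle delay introduced by HR 102 and delay circuit 101 is not modelled.

```python
def load_signal(msktms, cycles):
    """Return one boolean per machine cycle: True when the load signal is High."""
    count = 0  # assumed state right after reset, so the load is asserted at once
    signal = []
    for _ in range(cycles):
        load = count == 0
        signal.append(load)
        # On a load the counter takes MSKTMS; otherwise it counts down by one.
        count = msktms if load else count - 1
    return signal
```

Under these assumptions the load signal recurs every `msktms + 1` machine cycles, so a value of 5 gives a period of six cycles and a value of 0 asserts the load in every cycle.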

Output selection control section 19 consists of a 3-bit up counter 103 and a decoder 105 and switches output selection signal 1015 every machine cycle. The 3-bit up counter 103 counts up every clock. When reset signal 1000 becomes "High", or the counter output coincides with the 3-bit data TMS 1013 supplied from control circuit 21, a reset signal 1023 becomes "High", and during the subsequent machine cycle, the 3-bit up counter 103 is reset. The output 1003 from 3-bit up counter 103 is decoded by decoder 105 and becomes output selection signal 1015.

It should be noted that the step number of shift register 31-0 can be altered by the TMS signal and is $(TMS + 1)$ when a predetermined value is set at TMS.

Fig. 8 shows the detail of vsr 100, which is a 1-bit, one-step shift register. The vsr 100 performs the data shifting by reading out the data in vsr 100 to an output line 1011 during the former half of one machine cycle and by writing the data from an input line 1010 into vsr 100. Input line 1010 is connected with input data line 540 at the first step vsr 100, and is connected with the output line 1011 of the previous step vsr 100 at the vsr's other than the first step vsr. The data in vsr 100 is fed to output data line 300 when output selection signal 1015 is "High".

Fig. 9 shows the operation of VSR 31-0 when MSKTMS = 5 and TMS = 2, and Fig. 10 is a timing chart thereof. VSR 31-0 inputs and shifts the data once in $(MSKTMS + 1)$ machine cycles and sequentially outputs the data in VSR 31-0 during $(TMS + 1)$ machine cycles. In the case of Fig. 9, the data is input and shifted once in 6 machine cycles in VSR 31-0 and the data stored in VSR 31-0 are sequentially output during 3 machine cycles.

Symbols ①, ..., ⑨ shown in Figs. 9 and 10 designate a first, ..., a ninth machine cycle, respectively. The first machine cycle corresponds to the state where data A and B are stored in VSR 31-0 and data C has reached input data line 540. Then, when reset signal 1000 is made "High", the 4-bit down counter and 3-bit up counter are initialized, respectively. Also, since the read-out control signal (RDEN) 1005 is "High" from the first machine cycle to the second machine cycle, read enable signal $\phi_2'$ 1006 is "High" in the second machine cycle. Thus, from the first machine cycle to the second machine cycle, the data C is input in VSR 31-0 and the data A and B are shifted rightwards by one step.

During the second to the seventh machine cycle, 3-bit up counter 103 continues to count 0, 1, 2, 0, 1, 2, so that the data A, B and C stored in VSR 31-0 are output in the order of C, B, A, C, B, A.

At the seventh machine cycle, the subsequent data D reaches input data line 540. Then, 4-bit down counter 104 outputs zero and the read-out control signal (RDEN) 1005 is "High" from the seventh machine cycle to the eighth machine cycle so that, as in the first to second machine cycles, the data D is input in VSR 31-0, the data B and C are shifted rightwards by one step, and the data A is abandoned. Thereafter, during six machine cycles from the eighth machine cycle, the data B, C and D are preserved and sequentially read out from VSR 31-0 in the order of D, C, B, D, C, B.
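The sequence of Figs. 9 and 10 can be reproduced with a short behavioural model. This is a sketch under assumptions rather than a description of the circuit: `simulate_vsr` is a hypothetical name, a Python list stands in for the chain of vsr cells (index 0 being the first step), and the half-cycle timing of the enable signals is ignored.

```python
def simulate_vsr(inputs, tms, msktms, initial):
    """Return the data read out of a (tms + 1)-step register, cycle by cycle."""
    cells = list(initial)          # cells[0] is the first-step vsr
    assert len(cells) == tms + 1
    select = 0                     # models the 3-bit up counter output
    outputs = []
    for datum in inputs:
        # Shift once: the new datum enters, the oldest one is abandoned.
        cells = [datum] + cells[:-1]
        # Then read out one cell per machine cycle for msktms + 1 cycles.
        for _ in range(msktms + 1):
            outputs.append(cells[select])
            select = 0 if select == tms else select + 1
    return outputs
```

With `tms=2`, `msktms=5`, the initial contents B and A plus one empty cell, and the inputs C then D, the model yields C, B, A, C, B, A followed by D, C, B, D, C, B, which is the order given in the text.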

According to the arrangement of VSR 31-0 shown in Fig. 7, a local neighboring (kernel) image can be cut out from the intermittently supplied raster-scanned input image 1 and preserved in the step-number-variable shift register (VSR). The preserved local neighboring image data can also be supplied to an operation circuit in a time division manner.

Fig. 11 shows another arrangement of VSR 31-0.

In this arrangement VSR 31-0 consists of a write signal control section 28 for performing a shifting operation, an output selection control section 19 and step-number-variable shift register cells (vsr) 100. In this arrangement, each vsr 100 performs the write-in and read-out of the data in accordance with a write enable signal $\phi_1'$ 1106 output from write signal control section 28 and the read enable signal $\phi_2$ 1002 in synchronism with a clock, respectively.

Write signal control section 28 corresponds to read-out signal control section 18 of Fig. 7, which performs a shifting operation; it takes in (inputs) write enable signal 1001 in synchronism with a clock and outputs write enable signal 1106, which intermittently becomes "High". In the arrangement of Fig. 11, write signal control section 28 consists of the 4-bit down counter 104 only, and the load signal 1024 from 4-bit down counter 104 is employed as a write control signal as it is.

Fig. 12 shows a timing chart of the operation of VSR 31-0 when MSKTMS = 5 and TMS = 2 in this arrangement. The operation in this arrangement is the same as that of the arrangement of Fig. 9. In the timing chart of Fig. 12, in the first and seventh machine cycles, the load signal 1024 from 4-bit down counter 104 becomes "High" and write enable signal $\phi_1'$ 1106 also becomes "High". Thus, the data C is input in VSR 31-0 from the first machine cycle to the second machine cycle and the data A and B are each shifted rightwards by one step; the data D is input in VSR 31-0 from the seventh machine cycle to the eighth machine cycle and the data B and C are each shifted rightwards by one step.

According to this arrangement of VSR 31-0, the same effect as in the previous arrangement shown in Fig. 7 can be attained with a smaller amount of hardware.

Fig. 13 shows still another arrangement of VSR 31-0. In this arrangement, VSR 31-0 consists of a write control section 28 for performing a shifting operation, an output selection control section 29 and step-number-variable shift register cells (vsr) 100.

The output selection control section 29 in this arrangement consists of a 3-bit up counter, a RAM 203 and a decoder 105. The counter output line 1003 constitutes an address line for RAM 203, and the contents at the address specified by the counter output line are fetched from the RAM output line 2003, fed to decoder 105 and converted into an output selection signal 1015, which is supplied to vsr 100.

Fig. 14 shows the operation of VSR 31-0 when 0, 2 and 4 have previously been stored at the addresses of RAM 203, and Fig. 15 shows its timing chart. In Figs. 14 and 15, MSKTMS and TMS are set at 5 and 2, respectively.

The input and shifting of the data are performed from the first machine cycle to the second machine cycle, and thereafter during the second to seventh machine cycles, the contents A, C and E of vsr 100, which are specified every clock by the RAM output line, are read out in the order of E, C, A, E, C, A. Further, the input and shifting of the data are performed from the seventh machine cycle to the eighth machine cycle, and thereafter the contents B, D and F of vsr 100 are read out in the order of F, D, B, F, D, B in accordance with the RAM output 2003.

In accordance with this arrangement of VSR 31-0, by previously setting the data in the RAM, any data stored in the variable step shift register can be read out in any order, so that scattered local neighboring images can be efficiently processed in a time division manner.
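The effect of placing a RAM between the counter and the decoder can be illustrated with a small model. It is a sketch under assumptions: `simulate_vsr_with_ram` is a hypothetical name, the list `ram` stands in for the contents of RAM 203, a five-cell register is assumed so that cell indices 0, 2 and 4 exist, and signal timing is ignored.

```python
def simulate_vsr_with_ram(inputs, ram, msktms, initial):
    """Read out cells in the order given by a lookup table (the RAM stand-in)."""
    cells = list(initial)   # cells[0] is the first-step vsr
    address = 0             # models the up counter addressing the RAM
    outputs = []
    for datum in inputs:
        # Shift once: the new datum enters, the oldest one is abandoned.
        cells = [datum] + cells[:-1]
        for _ in range(msktms + 1):
            # The RAM content, not the counter itself, selects the cell.
            outputs.append(cells[ram[address]])
            address = (address + 1) % len(ram)
    return outputs
```

With `ram=[0, 2, 4]` and `msktms=5`, shifting in E onto the contents D, C, B, A gives the read-out order E, C, A, E, C, A, and shifting in F next gives F, D, B, F, D, B, as in Figs. 14 and 15.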

Fig. 16 shows an arrangement for performing a $3 \times 3$ local neighboring image data operation every three machine cycles in a time division manner using one main module 10 shown in Fig. 1. In this arrangement, each of VSR's 31-0, 31-1 and 31-2 preserves $1 \times 3$ local neighboring image data in three time division processings, and these VSR's are arranged in a manner of $3 \times 1$ by switching selectors 33-0 and 33-1. Thus, as a whole, $3 \times 3$ local neighboring image data are preserved in these VSR's. This arrangement is implemented in such a way that the control circuit 21 is externally operated so that MSKTMS and TMS are set at 2 and selectors 33-0 and 33-1 select data lines 200 and 201, respectively. It should be noted that only one main module 10 is used, so that the data need not be sent to image data output port 55 through selector 70.

Input image 1 is raster-scanned once in three machine cycles and is fed to VSR 31-0 and line buffer 20-0 through image data input port 54, one image datum every three machine cycles. Line buffer 20-0 delays the image data by the time required to scan one line of the input image 1. The output from line buffer 20-0 is fed to VSR 31-1 and line buffer 20-1. Line buffer 20-1, like line buffer 20-0, delays the image data by the time required to scan one line of the input image 1 and supplies the delayed image data to VSR 31-2. VSR's 31-0, 31-1 and 31-2 each take in one image datum once in three machine cycles and shift it. Then, nine local neighboring image data A, B, C, D, E, F, G, H and I required to calculate one image data of output image 2 are preserved inside VSR's 31-0, 31-1 and 31-2 during the three machine cycles.
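How two one-line delays and three 3-step registers yield a $3 \times 3$ neighborhood from a raster stream can be sketched as follows. The model is illustrative only: the function name `stream_3x3` is hypothetical, the line buffers are assumed to start filled with zeros, machine-cycle timing is not modelled, and a weighted sum is assumed as the neighborhood operation.

```python
from collections import deque


def stream_3x3(image, coeffs):
    """Stream pixels in raster order; return weighted 3x3 sums where valid."""
    height, width = len(image), len(image[0])
    line_buf0 = deque([0] * width)   # stand-in for line buffer 20-0
    line_buf1 = deque([0] * width)   # stand-in for line buffer 20-1
    # Three 3-step registers, one per row (stand-ins for VSR 31-0..31-2).
    regs = [deque([0] * 3, maxlen=3) for _ in range(3)]
    results = []
    for r in range(height):
        for c in range(width):
            p0 = image[r][c]
            p1 = line_buf0.popleft()   # pixel from one line earlier
            line_buf0.append(p0)
            p2 = line_buf1.popleft()   # pixel from two lines earlier
            line_buf1.append(p1)
            for reg, pixel in zip(regs, (p0, p1, p2)):
                reg.appendleft(pixel)  # shift in; the oldest pixel drops out
            if r >= 2 and c >= 2:      # all nine cells hold real pixels
                results.append(sum(regs[k][j] * coeffs[k][j]
                                   for k in range(3) for j in range(3)))
    return results
```

Here `coeffs[k][j]` weights the pixel `k` lines above and `j` columns to the left of the pixel just received, which is one possible convention and not one fixed by the text.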

The local neighboring image data preserved in VSR's 31-0, 31-1 and 31-2 are read out in a time division manner during the three machine cycles, and fed to processor elements (PE's) 37-0, 37-1 and 37-2 (Fig. 3) in parallel operating section 30. In PE's 37-0, 37-1 and 37-2, arithmetics are performed between the image data supplied from VSR's 31-0, 31-1 and 31-2 and the coefficient data supplied from the corresponding coefficient memories 36-0, 36-1 and 36-2. The operation results thus obtained are unified in arithmetic element 38. In this way, the operation results of the image data constituting a local neighboring image are fetched from arithmetic element 38 in three divided parts, unified in unifying circuit 40 during three machine cycles and output from main module 10 as output image 2.
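The division of one $3 \times 3$ operation into three machine cycles can be sketched in a few lines. The sketch assumes, purely for illustration, that each PE multiplies a pixel by its coefficient and that both arithmetic element 38 and unifying circuit 40 add their inputs; the text leaves the actual arithmetic open, and the name `time_division_3x3` is hypothetical.

```python
def time_division_3x3(window, coeffs):
    """Accumulate a 3x3 weighted sum one column per machine cycle."""
    accumulator = 0  # stand-in for unifying circuit 40 (assumed to add)
    for cycle in range(3):
        # Three PE's work in parallel, one per row; their products are
        # combined at once (stand-in for arithmetic element 38).
        partial = sum(window[row][cycle] * coeffs[row][cycle]
                      for row in range(3))
        accumulator += partial
    return accumulator
```

For example, with the window `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` and all coefficients equal to 1, the three partial results are 12, 15 and 18, and the unified result is 45.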


Fig. 17 shows an arrangement for performing a $3 \times 3$ local neighboring image data operation every machine cycle using three main modules 10, one of which is shown in Fig. 1. In this arrangement, three VSR's 31 are arranged in a manner of $1 \times 3$ by switching selectors 33-0 and 33-1. Also, the image data delayed from the input image data by one line of the data by line buffer 20-0 is output from image data output port 55 by switching selector 70, so that three main modules 10 are arranged in a manner of $3 \times 1$. Thus, as a whole, $3 \times 3$ local neighboring image data are simultaneously fetched. This arrangement is implemented in such a way that the control circuit 21 is externally operated so that MSKTMS and TMS are set at 0 and selectors 33-0, 33-1 and 70 select data lines 300, 301 and 200, respectively. It should be noted that each main module 10 selects the output from line buffer 20-0 by selector 70 and outputs the data on data line 200 from image data output port 55.

Input image 1 is raster-scanned every machine cycle. The input image data read out by the raster scanning are supplied to the image data input port 54 of a main module 10A. The image data delayed by one line of the data by line buffer 20-0 in main module 10A is output from the image data output port 55 of main module 10A, and fed to the image data input port 54 of a main module 10B. In the same manner, the image data delayed by a further one line of the data is delivered from main module 10B to main module 10C. The arithmetic result output from the operation data output port 65 of main module 10A is applied to the operation data input port 64 of main module 10B and is unified with the operation result of parallel operation section 30 by unifying circuit 40 in main module 10B. In the same way, the operation result is delivered from main module 10B to main module 10C and is unified with the operation result of parallel processor 30 in main module 10C, and the unified data is output as an output image data every machine cycle from the operation data output port 65.

Inside the main modules 10A, 10B and 10C, the respective image data are input in VSR 31-0, and sequentially shifted to VSR's 31-1 and 31-2. Thus, $3 \times 3$ local neighboring data A, B, C, D, E, F, G, H and I are simultaneously preserved in the total of nine VSR's 31 in the three main modules 10. The arithmetics thereof are performed by the total of three parallel processor sections 30 during one machine cycle.

Fig. 18 shows an arrangement for performing a $7 \times 7$ local neighboring operation every seven machine cycles using three main modules connected in the same way as in Fig. 17. In this arrangement, each of VSR's 31-0 to 31-2 preserves $1 \times 7$ local neighboring image data in seven time division processings, and these three VSR's are arranged in a manner of $3 \times 1$ by switching selectors 33-0 and 33-1. Thus, $3 \times 7$ local neighboring data are preserved in these VSR's for one main module. The image data delayed from the input image data by two lines of the data is output from image data output port 55 by switching the selector 70, so that these three main modules 10 are arranged in a manner of $3 \times 1$. However, the size of the local neighboring image data to be fetched is not $9 \times 7$ but $7 \times 7$. This is because one line of the image data is repeated in the adjacent main modules. The repetitions can be obviated by providing three line buffers 20 in one main module 10.

This arrangement is implemented in such a way that the control circuit 21 is externally operated so that MSKTMS and TMS are set at 6 and selectors 33-0, 33-1 and 70 select data lines 200, 201 and 201, respectively. It should be noted that each main module 10 selects the output from line buffer 20-1 by selector 70 and outputs the data on data line 201 from image data output port 55.

Input image 1 is raster-scanned once in seven machine cycles and is fed to the image data input port 54 of main module 10A, one pixel every seven machine cycles. The image data delayed from the input image by two lines of the data by line buffers 20-0 and 20-1 in main module 10A is output from image data output port 55 thereof and fed to the image data input port 54 of main module 10B. In the same way, the image data delayed by a further two lines of the data is delivered from main module 10B to main module 10C. The operation result output from the operation data output port 65 of main module 10A is fed to the operation data input port 64 of main module 10B and unified with the operation result of the parallel processing section 30 by unifying circuit 40 inside main module 10B. In the same way, the operation result is delivered from main module 10B to main module 10C and unified with the operation result of the parallel processing unit 30 in main module 10C, and the unified data is output from operation data output port 65 as an output image data every seven machine cycles.

Inside main module 10A, $3 \times 7$ local neighboring image data are preserved in VSR's 31-0, 31-1 and 31-2 thereof. Inside main modules 10B and 10C, $2 \times 7$ (not $3 \times 7$) local image data are preserved as effective image data in VSR's 31-1 and 31-2 thereof, respectively, during seven machine cycles, since the image data preserved in the respective VSR's 31-0 are the same as the image data preserved in VSR's 31-2 of the respective previous step main modules. Thus, $7 \times 7$ local neighboring image data are preserved during the seven machine cycles in the total of seven VSR's 31 in the three main modules 10A, 10B and 10C. The $7 \times 7$ local neighboring image data are read out during the seven machine cycles in a time division manner, and operated on by the total of three parallel processing sections 30 every seven machine cycles.
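The row count of the cascaded arrangement can be checked with simple bookkeeping. The helper below is a hypothetical illustration, not part of the described hardware: `module_rows` lists, for each main module, the line delays (relative to the input image) held by its VSR rows, given how many lines of delay are handed on to the next module.

```python
def module_rows(num_modules, rows_per_module, lines_passed):
    """Return, per module, the line delays covered by its VSR rows."""
    rows = []
    for module in range(num_modules):
        base = module * lines_passed  # delay of this module's input
        rows.append([base + k for k in range(rows_per_module)])
    return rows


def distinct_rows(rows):
    """Count the rows that are not repeated between modules."""
    return len({delay for module in rows for delay in module})
```

For three modules with three rows each and two lines passed on, the helper gives `[[0, 1, 2], [2, 3, 4], [4, 5, 6]]`: nine stored rows, of which seven are distinct, in line with the $7 \times 7$ figure stated above.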

Incidentally, by setting MSKTMS and TMS at 4 in the above arrangement, the arithmetic of $5 \times 5$ local neighboring image data can be performed every five machine cycles. In this case, it should be noted that the selection of the outputs from the line buffers 20 by the selector 70 in each main module 10 is controlled by control circuit 21.

Accordingly, in accordance with this embodiment of this invention as mentioned above, the arithmetic of $3 \times 3$ local neighboring image data can be performed using one main module 10 every three machine cycles. Also, three kinds of arithmetic of $3 \times 3$, $5 \times 5$ and $7 \times 7$ local neighboring image data can be performed using three main modules 10 through the operation of control circuit 21, without changing the manner of connecting them.

Fig. 19 shows another arrangement of the main module of the parallel image processor according to this invention. In the main module shown in Fig. 19, four VSR's 31, four arithmetic circuits (PE) inside parallel processing unit 30, three selectors 33 and three line buffers 20 are used, i.e. one more of each of these components than in the main module 10 shown in Fig. 1. The selector 33-1 is a 3-to-1 selector which selects one of three data lines 200, 201 and 301, so the arrangement of VSR's 31 can be changed among the three manners of $1 \times 4$, $2 \times 2$ and $4 \times 1$ by switching the selector 33-1. The selector 70 is a 4-to-1 selector which selects one of four data lines 540, 200, 201 and 202, so by switching the selector 70, one of the image data delayed from the input image data by zero, one, two and three lines can be selected and output from image data output port 55.

Fig. 20 shows an arrangement for performing a $4 \times 4$ local neighboring image data operation every four machine cycles in a time division manner using one main module 10 shown in Fig. 19. In this arrangement, the circuits other than line buffers 20 and VSR's 31 are omitted for brevity's sake. Also in this arrangement, one VSR 31 preserves $1 \times 4$ local neighboring image data in four time division processings, and four VSR's 31 are arranged in a manner of $4 \times 1$ by switching selectors 33. Thus, as a whole, $4 \times 4$ local neighboring image data are preserved in these VSR's. This arrangement is implemented in such a way that the control circuit 21 is externally operated so that MSKTMS and TMS are set at 3 and selectors 33-0, 33-1 and 33-2 select data lines 200, 201 and 202, respectively.

Fig. 21 shows an arrangement for performing a $4 \times 4$ local neighboring image data operation every machine cycle using four main modules, one of which is shown in Fig. 19. In this arrangement, four VSR's 31 are arranged in a manner of $1 \times 4$ by switching selectors 33 shown in Fig. 19. Also, in each module the image data delayed from the input image data by one line of the data is output from the image data output port by switching the selector 70, so that four modules 10 are arranged in a manner of $4 \times 1$. Thus, as a whole, $4 \times 4$ local neighboring image data can be simultaneously fetched. This arrangement is implemented in such a way that the control circuit 21 is externally operated so that MSKTMS and TMS are set at 0 and selectors 33-0, 33-1, 33-2 and 70 select data lines 300, 301, 302 and 200, respectively. It should be noted that selector 70 in Fig. 19 serves to select line buffer 20-0 and output the data on data line 200 from image data output port 55.

In the arrangement of Fig. 21, an input image data is applied to image data input port 54 of main module 10A. The image data delayed from the input image data by one line of the data is output from image data output port 55 of main module 10A and applied to image data input port 54 of main module 10B. In the same way, the image data is delivered from main module 10B to main module 10C and from main module 10C to main module 10D. Moreover, the operation result output from operation data output port 65 of main module 10A is applied to the operation data input port 64 of main module 10B. In the same way, the operation result is delivered from main module 10B to main module 10C and from main module 10C to main module 10D. Finally, an output image data is output from the operation data output port 65 of main module 10D every machine cycle.

Fig. 22 shows an arrangement for performing an $8 \times 8$ local neighboring image data operation every four machine cycles using four main modules 10 connected in the same manner as in Fig. 21. In this arrangement, each VSR preserves $1 \times 4$ local neighboring data in four time division processings, and these four VSR's are arranged in a manner of $2 \times 2$ by switching selectors 33 in Fig. 19. Thus, $2 \times 8$ local neighboring data are preserved for one main module. The image data delayed from the input image data by two lines of the data is output from image data output port 55 by switching the selector 70, so that these four main modules 10 are arranged in a manner of $4 \times 1$. As a whole, $8 \times 8$ local neighboring image data are preserved in this arrangement. This arrangement is implemented in such a way that the control circuit 21 is externally operated so that MSKTMS and TMS are set at 3 and selectors 33-0, 33-1, 33-2 and 70 select data lines 300, 200, 302 and 201, respectively. It should be noted that the main module 10 in Fig. 19 selects the output from line buffer 20-1 by means of selector 70 and outputs the data on data line 201 through image data output port 55.

In the arrangement of Fig. 22, input image 1 is raster-scanned once in four machine cycles and is fed to the image data input port 54 of main module 10A, one pixel every four machine cycles. The image data delayed from the input image by two lines of the data by line buffers 20-0 and 20-1 in main module 10A is output from image data output port 55 thereof and fed to the image data input port 54 of main module 10B. In the same way, the image data delayed by a further two lines of the data is delivered from main module 10B to main module 10C, and then from main module 10C to main module 10D. The operation result is delivered from main module 10D as an output image data every four machine cycles.

In accordance with this embodiment of this invention as mentioned above, the arithmetic of $4 \times 4$ local neighboring image data can be performed using one main module 10 every four machine cycles. Also, several kinds of arithmetic of local neighboring image data ($4 \times 4$, $8 \times 8$, etc.) can be performed using plural main modules 10 through the external operation of control circuit 21, without changing the manner of connecting them.

Fig. 23 shows still another arrangement of the main module 10 of the parallel image processor, which includes three line buffers 20, nine VSR's and nine processor elements (PE) 37 in parallel processor section 30.

Fig. 24 shows an arrangement for performing a $3 \times 3$ local neighboring image data operation every machine cycle using one main module 10. Fig. 25 shows an arrangement for performing a $3 \times 9$ local neighboring image data operation every three machine cycles in a time division manner by means of the same hardware as in Fig. 24.

Fig. 26 shows an arrangement for performing a $9 \times 9$ local neighboring image data operation every machine cycle using nine main modules 10.

An image data f is applied to the image data input port 54 of main module 10A. It is also delayed by three pixels by a shift register 3 and applied to the image data input port 54 of main module 10B. The delayed image data is further delayed by three pixels by a shift register 4 and applied to the image data input port 54 of main module 10C. The image data delayed from the input image data f by three lines of the data, output from the respective image data output ports 55 of main modules 10A, 10B and 10C, are applied to the image data input ports 54 of main modules 10D, 10E and 10F, respectively. The image data delayed from the input image data f by six lines of the data, output from the respective image data output ports 55 of main modules 10D, 10E and 10F, are applied to the image data input ports 54 of main modules 10G, 10H and 10I, respectively. Further, the operation result output from the operation data output port 65 of main module 10A is applied to the operation data input port 64 of main module 10D. In the same way, the operation result is delivered from main module 10D to main module 10G, from 10G to 10B, and further to 10E, 10H, 10C, 10F and 10I. Finally, an output image data g is output from the operation data output port 65 of main module 10I every machine cycle.
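The wiring of the nine modules can be summarized as a small table of delays plus a cascade order. The following sketch is illustrative only: the function names are hypothetical, and it assumes that each main module covers a $3 \times 3$ block as in Fig. 24.

```python
MODULES = "ABCDEFGHI"  # main modules 10A to 10I


def module_delays():
    """Map each module to its (line delay, pixel delay) relative to f."""
    delays = {}
    for index, name in enumerate(MODULES):
        band, column = divmod(index, 3)
        # Each band of three modules is three lines later; each column
        # within a band is three pixels later.
        delays[name] = (3 * band, 3 * column)
    return delays


def cascade_order():
    """Order in which the operation result passes through the modules."""
    return [MODULES[band * 3 + column]
            for column in range(3) for band in range(3)]


def covered_positions():
    """All (line, pixel) offsets covered, assuming 3x3 per module."""
    return {(line + r, pixel + c)
            for line, pixel in module_delays().values()
            for r in range(3) for c in range(3)}
```

Under these assumptions the nine blocks tile the $9 \times 9$ region without overlap, and the cascade order is A, D, G, B, E, H, C, F, I, matching the delivery sequence given in the text.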

Fig. 27 shows an arrangement for performing a $9 \times 9$ local neighboring image data operation every three machine cycles in a time division manner using three main modules 10. In this arrangement, the same $9 \times 9$ local neighboring image data operation as in the arrangement of Fig. 26 can be realized with one third of the amount of hardware of the latter.
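As a rough consistency check on these figures, assume the nine processor elements per main module of Fig. 23 and one element operation per processor element per machine cycle (a counting assumption, not a statement from the text). Both arrangements then supply the 81 element operations that a $9 \times 9$ region requires:

$$9 \text{ modules} \times 9 \text{ PE's} \times 1 \text{ cycle} = 81 \quad (\text{Fig. 26}), \qquad 3 \text{ modules} \times 9 \text{ PE's} \times 3 \text{ cycles} = 81 \quad (\text{Fig. 27}).$$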

In accordance with this embodiment of this invention, the arithmetic of $3 \times 3$ local neighboring image data can be performed using one main module 10 every machine cycle. Also, by using plural main modules 10, arithmetic employing a larger local image region, e.g. zero-crossing operation, pattern matching, etc., can be performed every machine cycle. Further, arithmetic employing a larger local image region can be performed by a smaller amount of hardware in time division processing.

Thus, several embodiments of this invention have been explained above. It should be noted in each embodiment that the respective numbers of line buffers 20, VSR's 31, and processor elements (PE) 37 in parallel processor section 30 can be determined as required in relation to the degree of integration of the LSI. If a single main module provided with $m$ or $m-1$ line buffers and $m$ arithmetic circuits is used for time division processing in $n$ cycles, the processing of $m \times n$ local neighboring image data can be performed in $n$ machine cycles. Alternatively, if $n$ such main modules are arranged for parallel processing of the respective line buffer outputs selected by selector 70, one for each main module, the processing of $n \times m$ local neighboring image data can be performed in one machine cycle.

Further, merely by switching selectors 70 and 33 with the $n$ main modules provided, time division processing of up to $(m \times n)$ rows $\times$ $t$ columns can be performed. (In this case, $t$ machine cycles and an arrangement of VSR's of $t$ steps are required.)

A wider variety of parallel processings can be performed at a high speed by providing $m \times n$ arithmetic circuits 37.

Thus, the parallel image processor according to this invention can be flexibly adapted to the conflicting needs of users, who desire either that a large amount of image data be processed at a high speed, or that it be processed by a small amount of hardware although more time may be taken.

- (1) In accordance with this invention, the local neighboring image region to be subjected to a local neighboring image data processing can be easily expanded without the need for externally equipped circuits and complicated controls.
- (2) In accordance with this invention, the local neighboring image operations for various local neighboring image regions can be realized by altering the construction of each of the main modules through the operation of a control circuit provided therein, without altering the manner of connecting the main modules.
- (3) In accordance with this invention, the amount of hardware used can be greatly reduced by implementing each main module as an LSI.

CLAIMS:

1. A parallel image processor consisting of at least one main module (10) for performing a parallel operation of local neighboring image data on the basis of input image data externally taken in, comprising:

at least one data memory (20-i) for delaying the input image data by one line of the data in order; and

an output port (55) for fetching the delayed image data and feeding it to the other main module as an input image data for connection between the main modules.

2. A parallel image processor consisting of at least one main module for performing a parallel operation of local neighboring image data on the basis of externally input image data, comprising:

at least one data memory (20-i) for delaying the input image data by one line of the data in order;

a selector (70) for selectively switching the externally input image data and the delayed image data; and

an output port (55) for fetching the selected image data and feeding it to the other main module as an input image data for connection between the main modules.

3. A parallel image processor as claimed in Claim 1 or 2, wherein said main module comprises:

at least one sequential memory means (31-i) for storing local neighboring image data sequentially cut out from the input image data;

a parallel operation unit (30) for performing a parallel operation of the local image data, and

unifying means (40) for unifying the results of the parallel operation and outputting the unified result.

4. A parallel image processor as claimed in Claim 3, wherein said parallel operation unit (30) consists of a plurality of processor elements (37-i) and a plurality of coefficient memories for storing coefficient data corresponding to the processor elements, said sequential memory means (31-i) consists of a plurality of memory elements corresponding to the processor elements, and the local image data cut out from the sequential memory means and the coefficient data are operated in parallel in the corresponding processor elements.

5. A parallel image processor as claimed in Claim 3, wherein said sequential memory means is constructed by shift registers.

6. A parallel image processor as claimed in Claim 1 or 2, wherein said data memory is constructed by RAM's or by shift registers.

7. A parallel image processor consisting of at least one main module (10) for performing the parallel operation of $m \times n$ ($m$, $n$: integers) local neighboring image data cut out from externally input image data, comprising:

at least $(m - 1)$ line buffers (20-i) for delaying said input image data in order by one line of the data;

sequential memory means (30-i) consisting of $m \times n$ steps, for storing the local neighboring image data sequentially cut out from the input image data or the delayed image data;

a parallel operation section (30) including $m$ processor elements for performing the parallel operation of the local neighboring image data; and

a unifying circuit (40) for unifying the results of the parallel operation in $n$ machine cycles and outputting the unified result.

8. A parallel image processor as claimed in Claim 7, wherein said parallel operation section (30) comprises $m$ coefficient memories (36-i) for coefficient data corresponding to the processor elements (37), respectively.

9. A parallel image processor consisting of at least one main module (10) for performing a parallel operation of local neighboring image data cut out from externally input image data, comprising:


an input port (54) for externally taking in an input image data;

at least $(m - 1)$ line buffers for delaying the input image data in order by one line of the data;

m sequential memory means having a variable number of steps, for storing the local neighboring image data sequentially cut out from the input image data or the delayed image data;

first $(m - 1)$ selectors (33-i) for selectively switching the outputs from the line buffers and the outputs from the sequential memory means and supplying them to a succeeding sequential memory means;

a parallel operation section (30) comprising m processor elements (37) for performing the parallel operation of the local neighboring image data output from the corresponding sequential memory means;

unifying means (40) for unifying the results of the parallel operation and outputting the unified result;

a second selector (70) for selectively switching the externally input image data and the image data delayed by the line buffers;

an output port (55) for fetching the image data selected by the second selector; and

a control circuit (21) for supplying control signals to said first and second selectors.

10. A parallel image processor as claimed in Claim 9, wherein said parallel operation section (30) comprises m coefficient memories for coefficient data corresponding to the processor elements (37), respectively, and the local neighboring image data cut out from the sequential memory means and the coefficient data are operated in parallel in the corresponding processor elements.

11. A parallel image processor as claimed in Claim 9, wherein said sequential memory means intermittently performs a shift operation in response to clock signals and reads out its memory content at each clock signal.

12. A parallel image processor as claimed in Claim 9, wherein said line buffers comprise information memory sections (241, 242) permitting at least one bit to be simultaneously read out and written in, and a row address control section (245) for controlling the row addresses of said information memory section, the read-out, write-in starting and ending row addresses of said information memory section are determined in accordance with the control signals supplied to the row address control section so as to make variable the number of delay steps.

13. A parallel image processor as claimed in Claim 9, further comprising an operation data input port (64) for taking in an operation result externally provided which is unified with the operation result obtained from the parallel operation section in said unifying means (40), and an operation data output port (65) for externally outputting the unified result.

14. A parallel image processor as claimed in Claim 9, wherein each of said sequential memory means in said main module is constructed with n steps and said first selectors are switched to the outputs from the line buffers, so that $m \times n$ local neighboring image data are processed in a time division manner in n machine cycles.

15. A parallel image processor as claimed in Claim 9, consisting of n main modules, wherein $n \times m$ local neighboring image data are processed during one machine cycle in such a state that the image data output ports (55) are connected with the input ports of succeeding main modules, the sequential memory means each have one step, the first selectors (33-i) are switched to the outputs from the sequential memory means, and the second selector (70) is switched to the output from the line buffer delayed by one line of the data.

16. A parallel image processor as claimed in Claim 9, consisting of n main modules, wherein a maximum of $(m \times n) \times t$ local neighboring image data are processed in a time division manner in t machine cycles in such a state that the image data output ports are connected with the input ports of succeeding main modules, the sequential memory means each have t steps, the first selectors (33-i) are switched to the outputs from the line buffers, and the second selector (70) is switched to either one of the outputs from the line buffers.
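
The data path recited in Claims 7 and 14 can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the LSI circuit itself: the function names `local_neighborhood_filter` and `direct_filter` are hypothetical, the operation is assumed to be a multiply-accumulate with the coefficient data, and the unifying step is assumed to be a sum. It streams pixels in raster order through (m − 1) line buffers and m shift registers of n steps, then forms each output in n "machine cycles", with m processor elements working in each cycle.

```python
from collections import deque


def local_neighborhood_filter(image, coeffs):
    """Streaming m x n multiply-accumulate over a raster-scanned image.

    image:  list of equal-length rows of numbers
    coeffs: m rows of n coefficients (one row per processor element)
    Returns (H - m + 1) rows of (W - n + 1) outputs (valid region only).
    """
    m, n = len(coeffs), len(coeffs[0])
    width = len(image[0])
    # (m - 1) line buffers, each delaying the pixel stream by one line.
    line_buffers = [deque([0] * width, maxlen=width) for _ in range(m - 1)]
    # m shift registers of n steps, one per processor element.
    shift_regs = [deque([0] * n, maxlen=n) for _ in range(m)]

    out = []
    for y, row in enumerate(image):
        row_out = []
        for x, pixel in enumerate(row):
            # Tap 0 is the undelayed input; tap k is delayed by k lines.
            taps = [pixel]
            for buf in line_buffers:
                delayed = buf[0]      # value written one line ago
                buf.append(taps[-1])  # write previous tap, drop the oldest
                taps.append(delayed)
            for reg, tap in zip(shift_regs, taps):
                reg.append(tap)
            if y >= m - 1 and x >= n - 1:
                total = 0
                for j in range(n):  # n machine cycles (time division)
                    # m processor elements operate in the same cycle;
                    # window row i is held by the tap delayed m-1-i lines.
                    partial = [coeffs[i][j] * shift_regs[m - 1 - i][j]
                               for i in range(m)]
                    total += sum(partial)  # unifying step (assumed: sum)
                row_out.append(total)
        if row_out:
            out.append(row_out)
    return out


def direct_filter(image, coeffs):
    """Reference result computed directly from the stored image."""
    m, n = len(coeffs), len(coeffs[0])
    return [[sum(coeffs[i][j] * image[y + i][x + j]
                 for i in range(m) for j in range(n))
             for x in range(len(image[0]) - n + 1)]
            for y in range(len(image) - m + 1)]
```

Comparing the streaming model against `direct_filter` on a small test image is a quick way to check that the delay and shift ordering match the intended window.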

[Drawings: FIG. 1 to FIG. 27 on 25 sheets; the images are not reproduced here. One sheet carries the label "INPUT IMAGE".]

Europäisches Patentamt  
European Patent Office  
Office européen des brevets

(11) Publication number:

0 189 943  
A3

(12)

## EUROPEAN PATENT APPLICATION

(21) Application number: 86101338.1

(51) Int. Cl.<sup>3</sup>: G 06 F 15/68

(22) Date of filing: 31.01.86

(30) Priority: 01.02.85 JP 16553/85  
27.09.85 JP 214163/85  
20.12.85 JP 285576/85

(43) Date of publication of application:  
06.08.86 Bulletin 86/32

(88) Date of deferred publication of search report: 14.12.88

(84) Designated Contracting States:  
CH DE FR GB IT LI NL SE

(71) Applicant: HITACHI, LTD.  
6, Kanda Surugadai 4-chome Chiyoda-ku  
Tokyo 100(JP)

(72) Inventor: Miura, Shuuichi  
Yuhoryo 305, 20-3 Ayukawacho-6-chome  
Hitachi-shi(JP)

(72) Inventor: Kobayashi, Yoshiaki  
24-5, Mikanoharacho-2-chome  
Hitachi-shi(JP)

(72) Inventor: Fukushima, Tadashi  
23-5, Hanayamacho-1-chome  
Hitachi-shi(JP)

(72) Inventor: Okuyama, Yoshiyuki  
Tozawaryo 901, 10-12 Suehirocho-3-chome  
Hitachi-shi(JP)

(72) Inventor: Katoh, Takeshi  
5-19, Higashionumacho-2-chome  
Hitachi-shi(JP)

(72) Inventor: Hirasawa, Kotaro  
10-7, Kanesawacho-7-chome  
Hitachi-shi(JP)

(72) Inventor: Asada, Kazuyoshi  
15-8-1, Suwacho-4-chome  
Hitachi-shi(JP)

(74) Representative: Strehl, Schübel-Hopf, Groening, Schulz  
Widenmayerstrasse 17 Postfach 22 03 45  
D-8000 München 22(DE)

(54) Parallel image processor.

(57) An LSI parallel image processor in which line buffers (20-i) and data-flow switching circuits (70) each requiring a larger amount of hardware in the prior art are incorporated into an LSI circuit, the image data delayed at the line buffers (20-i) is output from an image data output port (55), shift registers (31-i) each having a variable number of steps for preserving local image regions are intermittently shifted-in in accordance with applied clocks, and the contents of the shift registers (31-i) are sequentially read out.

EP 0 189 943 A3

FIG. 1





European Patent  
Office

EUROPEAN SEARCH REPORT

Application Number: EP 86 10 1338

DOCUMENTS CONSIDERED TO BE RELEVANT

| Category | Citation of document with indication, where appropriate, of relevant passages | Relevant to claim | CLASSIFICATION OF THE APPLICATION (Int. Cl. 4) |
|----------|-------------------------------------------------------------------------------|-------------------|------------------------------------------------|
| Y | EP-A-0 118 053 (HITACHI LTD)<br>* Figures 1-8,17; pages 7,8,97-100 * | 1-10 | G 06 F 15/68 |
| Y | US-A-4 167 728 (S.R. STERNBERG)<br>* Abstract; column 2 * | 1-10 | |
| A | ELECTRONIC DESIGN, vol. 32, no. 20, 4th October 1984, pages 209-215, Waseca, Minnesota, US; T. FUKUSHIMA: "Image signal processor computes fast enough for gray-scale video"<br>* Whole article * | 1-16 | |

TECHNICAL FIELDS SEARCHED (Int. Cl.4): G 06 F

The present search report has been drawn up for all claims

| Place of search | Date of completion of the search | Examiner |
|-----------------|----------------------------------|----------|
| THE HAGUE | 23-09-1988 | CHATEAU J.P. |

CATEGORY OF CITED DOCUMENTS

- X : particularly relevant if taken alone
- Y : particularly relevant if combined with another document of the same category
- A : technological background
- O : non-written disclosure
- P : intermediate document
- T : theory or principle underlying the invention
- E : earlier patent document, but published on, or after the filing date
- D : document cited in the application
- L : document cited for other reasons
- & : member of the same patent family, corresponding document
